Generalized Ewens–Pitman model for Bayesian clustering

Author

  • HARRY CRANE
Abstract

We propose a Bayesian method for clustering from discrete data structures that commonly arise in genetics and other applications. This method is equivariant with respect to relabelling units; unsampled units do not interfere with sampled data; and missing data do not hinder inference. Cluster inference using the posterior mode performs well on simulated and real datasets, and the posterior predictive distribution enables supervised learning based on a partial clustering of the sample.

Similar resources

A Markov random field-regulated Pitman-Yor process prior for spatially constrained data clustering

In this work, we propose a Markov random field-regulated Pitman–Yor process (MRF-PYP) prior for nonparametric clustering of data with spatial interdependencies. The MRF-PYP is constructed by imposing a Pitman–Yor process over the distribution of the latent variables that allocate data points to clusters (model states), the discount hyperparameter of which is regulated by an additionally postula...

Consistency in Latent Allocation Models

A probabilistic formulation for latent allocation models was introduced in the machine learning literature by Blei et al. (2003) in the study of a corpus of documents. This article addresses the consistency properties of various posterior probabilities on the space of latent allocations, focusing on the “bag of words” model. It is shown that the Latent Dirichlet Allocation and Ewens-Pitman pri...

Regeneration in random combinatorial structures

The theory of Kingman’s partition structures has two culminating points:

  • the general paintbox representation, relating finite partitions to hypothetical infinite populations via a natural sampling procedure,
  • a central example of the theory: the Ewens-Pitman two-parameter partitions.

In these notes we further develop the theory by

  • passing to structures enriched by the order on the collection of...

Long-run Behavior of Macroeconomic Models with Heterogeneous Agents: Asymptotic Behavior of One- and Two-Parameter Poisson-Dirichlet Distributions

This paper discusses the asymptotic behavior of one- and two-parameter Poisson-Dirichlet models, that is, Ewens models and their two-parameter extensions by Pitman, and shows that their asymptotic behaviors are very different. The paper shows that the asymptotic properties of a class of one- and two-parameter Poisson-Dirichlet distribution models are drastically different. Convergence behavior is expressed in terms o...

Bayesian Inference for Spatial Beta Generalized Linear Mixed Models

In some applications, the response variable assumes values in the unit interval. The standard linear regression model is not appropriate for modelling this type of data because the normality assumption is not met. Alternatively, the beta regression model has been introduced to analyze such observations. A beta distribution represents a flexible density family on (0, 1) interval that covers symm...

Publication date: 2014